Smart Paper Notes

Towards a methodology for addressing missingness in datasets, with an application to demographic health datasets

Gift Khangamwa , Terence L. van Zyl , Clint J. van Alten

Categories: Machine Learning | Artificial Intelligence

2022-11-05

Missing data is a common concern in health datasets, and its impact on good decision-making processes is well documented. Our study's contribution is a methodology for tackling missing data problems that combines synthetic dataset generation, missing data imputation and deep learning methods. Specifically, we conducted a series of experiments with these objectives: $a)$ generating a realistic synthetic dataset, $b)$ simulating data missingness, $c)$ recovering the missing data, and $d)$ analyzing imputation performance. Our methodology used a Gaussian mixture model, whose parameters were learned from a cleaned subset of a real demographic and health dataset, to generate the synthetic data. We simulated missingness degrees of $10\%$, $20\%$, $30\%$, and $40\%$ under the missing completely at random (MCAR) scheme. We used an integrated performance analysis framework involving clustering, classification and direct imputation analysis. Our results show that models trained on synthetic and imputed datasets could make predictions with an accuracy of $83\%$ and $80\%$ on $a)$ an unseen real dataset and $b)$ an unseen reserved synthetic test dataset, respectively. Moreover, the models that used the DAE method for imputation yielded the lowest log loss, an indication of good performance, even though the accuracy measures were slightly lower. In conclusion, our work demonstrates that using our methodology, one can reverse engineer a solution to resolve missingness on an unseen dataset with missingness. Moreover, though we used a health dataset, our methodology can be utilized in other contexts.
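
The simulate-then-recover loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy Gaussian data stands in for the synthetic dataset, column-mean imputation stands in for the imputation methods, and the helper names (`mask_mcar`, `impute_column_means`, `rmse_on_missing`) are hypothetical.

```python
import math
import random


def mask_mcar(rows, rate, rng):
    """Return a copy of rows where each cell becomes None with probability `rate`.

    Under MCAR the chance that a cell is missing does not depend on any value.
    """
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def impute_column_means(rows):
    """Fill each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rmse_on_missing(truth, masked, imputed):
    """RMSE computed only over the cells that were masked out."""
    errors = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


rng = random.Random(0)
# Toy stand-in for the synthetic dataset (assumption): 500 rows, 3 Gaussian features.
data = [[rng.gauss(mu, 1.0) for mu in (0.0, 5.0, -2.0)] for _ in range(500)]

results = {}
for rate in (0.1, 0.2, 0.3, 0.4):  # the missingness degrees named in the abstract
    masked = mask_mcar(data, rate, rng)
    imputed = impute_column_means(masked)
    results[rate] = rmse_on_missing(data, masked, imputed)
    print(f"missing rate {rate:.0%}: RMSE on masked cells = {results[rate]:.3f}")
```

Because the true values are known before masking, the error can be measured directly on the masked cells, which is what makes a synthetic dataset useful for comparing imputation methods.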

translated by Google Translate

Evaluating State of the Art, Forecasting Ensembles- and Meta-learning Strategies for Model Fusion

Pieter Cawood , Terence van Zyl

Categories: Machine Learning | Artificial Intelligence

2022-03-07

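Hybridisation and ensemble learning are popular model fusion techniques for improving the predictive power of forecasting methods. Because little research has combined these two promising approaches, this paper focuses on the utility of the Exponential Smoothing-Recurrent Neural Network (ES-RNN) in the pool of base models for different ensembles. We compare against several state-of-the-art ensembling techniques, with arithmetic model averaging as a benchmark. We experiment with the M4 forecasting dataset of 100,000 time series, and the results show that Feature-based Forecast Model Averaging (FFORMA) is, on average, the best technique for late data fusion with the ES-RNN. However, on the M4 Daily subset, stacking was the only ensemble that successfully handled the case where all base models perform similarly. Our experimental results indicate that we attain state-of-the-art forecasting results compared with N-BEATS as a benchmark. We conclude that model averaging is more robust than model selection and stacking strategies. Furthermore, the results show that gradient boosting is superior for implementing ensemble learning strategies.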
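
As a rough illustration of the arithmetic model averaging benchmark mentioned above, the sketch below combines two forecasts point by point and scores them with sMAPE, the symmetric percentage error used in the M4 competition. The forecast values are made up, and `average_forecasts` is a hypothetical helper, not code from the paper.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    terms = [
        2.0 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
        if abs(a) + abs(f) > 0  # skip points where the ratio is undefined
    ]
    return 100.0 * sum(terms) / len(terms) if terms else 0.0


def average_forecasts(forecasts, weights=None):
    """Combine base-model forecasts point by point.

    With weights=None this is plain arithmetic model averaging.
    """
    n_models = len(forecasts)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


# Made-up numbers: one model under-forecasts, the other over-forecasts.
actual = [100.0, 110.0, 120.0]
model_a = [90.0, 100.0, 110.0]
model_b = [110.0, 120.0, 130.0]

ensemble = average_forecasts([model_a, model_b])
print("model A sMAPE:", round(smape(actual, model_a), 2))
print("model B sMAPE:", round(smape(actual, model_b), 2))
print("average sMAPE:", round(smape(actual, ensemble), 2))
```

In this toy case the two errors cancel, which is the idealised situation in which averaging helps most. Real base models rarely err in exactly opposite directions.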

A Statistics and Deep Learning Hybrid Method for Multivariate Time Series Forecasting and Mortality Modeling

Thabang Mathonsi , Terence L. van Zyl

Categories: Machine Learning | Machine Learning (Statistics)

2021-12-16

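Hybrid methods have been shown to outperform pure statistical and pure deep learning methods at forecasting tasks and at quantifying the uncertainty associated with those forecasts (prediction intervals). One example is the Exponential Smoothing Recurrent Neural Network (ES-RNN), a hybrid between a statistical forecasting model and a recurrent neural network variant. ES-RNN achieved a 9.4% improvement in absolute error in the Makridakis-4 Forecasting Competition. This improvement, and similar outperformance by other hybrid models, has primarily been demonstrated only on univariate datasets. Difficulties with applying hybrid forecasting methods to multivariate data include ($i$) the high computational cost of hyperparameter tuning for the models, ($ii$) challenges associated with the auto-correlation inherent in the data, and ($iii$) complex dependency (cross-correlation) between the covariates that may be hard to capture. This paper presents Multivariate Exponential Smoothing Long Short-Term Memory (MES-LSTM), a generalized multivariate extension of ES-RNN that overcomes these challenges. MES-LSTM utilizes a vectorized implementation. We test MES-LSTM on several aggregated coronavirus disease 2019 (COVID-19) morbidity datasets and find that our hybrid approach shows consistent, significant improvement over pure statistical and deep learning methods in forecast accuracy and prediction interval construction.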
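
For readers unfamiliar with the statistical half of such hybrids, the sketch below shows simple exponential smoothing in its most basic form, with the level update l_t = alpha * y_t + (1 - alpha) * l_(t-1). This is an assumption-laden illustration: ES-RNN and MES-LSTM use richer smoothing components, and the idea of dividing the series by its level before handing it to a network is shown here only schematically.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level for each time step.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
        levels.append(level)
    return levels


series = [10.0, 20.0, 30.0]  # made-up observations
levels = exponential_smoothing(series, alpha=0.5)

# Schematic hybrid step (assumption): de-level the series so that a neural
# network would only need to model what the smoothing leaves unexplained.
normalised = [y / l for y, l in zip(series, levels)]

print(levels)
print(normalised)
```

A larger `alpha` makes the level track recent observations more closely, while a smaller one gives a smoother, slower-moving level.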

Feature-weighted Stacking for Nonseasonal Time Series Forecasts: A Case Study of the COVID-19 Epidemic Curves

Pieter Cawood , Terence L. van Zyl

Categories: Machine Learning | Artificial Intelligence

2021-08-19

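We investigate ensembling techniques in forecasting and examine their potential for use with nonseasonal time series similar to those seen in the early days of the COVID-19 pandemic. Developing improved forecasting methods is essential, as they support data-driven decisions by organisations and decision-makers during critical phases. We propose late data fusion using a stacked ensemble of two forecasting models and two meta-features that prove their predictive power during a preliminary forecasting stage. The final ensembles include Prophet and a long short-term memory (LSTM) neural network as base models. The base models are combined by a multilayer perceptron (MLP), which takes into account the meta-features that correlate most strongly with each base model's forecast accuracy. We further show that including meta-features generally improves the ensemble's forecast accuracy across two forecast horizons of seven and fourteen days. This research reinforces previous work and demonstrates the value of combining traditional statistical models with deep learning models to produce more accurate forecasting models for time series from different domains and with different seasonality.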
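
The core idea of stacking, learning how to combine base forecasts from data rather than fixing the weights, can be sketched as follows. To stay self-contained, a one-parameter least-squares combiner stands in for the MLP meta-learner and no meta-features are used, so this is a simplification rather than the method described above. The forecast values and the function name `fit_stacking_weight` are hypothetical.

```python
def fit_stacking_weight(actual, forecast_a, forecast_b):
    """Least-squares weight w for the combination w * a + (1 - w) * b.

    Minimising sum((y - b) - w * (a - b))^2 gives
    w = sum((y - b) * (a - b)) / sum((a - b)^2).
    """
    num = sum((y - b) * (a - b) for y, a, b in zip(actual, forecast_a, forecast_b))
    den = sum((a - b) ** 2 for a, b in zip(forecast_a, forecast_b))
    return num / den if den > 0 else 0.5  # identical forecasts: any weight works


def combine(weight, forecast_a, forecast_b):
    """Apply the learned weight to new base-model forecasts."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(forecast_a, forecast_b)]


# Made-up validation-window values for two base models.
prophet_forecast = [1.0, 2.0, 3.0, 4.0]
lstm_forecast = [2.0, 2.0, 5.0, 3.0]
actual = [1.7, 2.0, 4.4, 3.3]

w = fit_stacking_weight(actual, prophet_forecast, lstm_forecast)
stacked = combine(w, prophet_forecast, lstm_forecast)
print("learned weight on first model:", round(w, 3))
print("stacked forecast:", [round(v, 3) for v in stacked])
```

A feature-weighted variant would let the weight depend on meta-features of each series instead of being a single constant, which is the role the MLP plays in the abstract.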

Asynchronous Hybrid Reinforcement Learning for Latency and Reliability Optimization in the Metaverse over Wireless Communications

Wenhan Yu , Terence Jie Chua , Jun Zhao

Categories: Machine Learning

2022-12-30

Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the development of the Metaverse. The demand for Metaverse applications and, hence, for real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where, in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned a channel in the UL transmission stage. Some problems arise therefrom: (i) an interactive multi-process chain, specifically an Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To address these problems, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that, compared to the proposed baselines, AAHC obtains better solutions with preferable training time.

E-NER -- An Annotated Named Entity Recognition Corpus of Legal Text

Ting Wai Terence Au , Ingemar J. Cox , Vasileios Lampos

Categories: Natural Language Processing

2022-12-19

Identifying named entities, such as a person, location or organization, in documents can highlight key information to readers. Training Named Entity Recognition (NER) models requires an annotated data set, and producing one can be a time-consuming, labour-intensive task. Nevertheless, there are publicly available NER data sets for general English. Recently there has been interest in developing NER for legal text. However, prior work and the experimental results reported here indicate that there is a significant degradation in performance when NER methods trained on a general English data set are applied to legal text. We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission's EDGAR data set. Training a number of different NER algorithms on the general English CoNLL-2003 corpus but testing on our test collection confirmed significant degradations in accuracy, as measured by the F1-score, of between 29.4% and 60.4%, compared to training and testing on the E-NER collection.
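
Since the comparison above is reported in terms of F1-score, a minimal sketch of an exact-match, entity-level F1 may help. It assumes entities are represented as `(start, end, label)` tuples and that a prediction counts only if it matches a gold entity exactly. The example spans are made up, and the paper's evaluation script may differ in detail.

```python
def entity_f1(gold, predicted):
    """Exact-match F1 over sets of (start, end, label) entity tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2.0 * precision * recall / (precision + recall)


# Made-up annotations for one document.
gold_entities = {(0, 2, "ORG"), (5, 6, "PER"), (9, 10, "LOC")}
predicted_entities = {(0, 2, "ORG"), (5, 6, "ORG")}  # second span has the wrong label

score = entity_f1(gold_entities, predicted_entities)
print(f"entity-level F1 = {score:.2f}")
```

Here one of two predictions is correct (precision 0.5) and one of three gold entities is found (recall 1/3), so F1 is 0.4. A span with the right boundaries but the wrong label earns no credit under exact matching.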

Unified, User and Task (UUT) Centered Artificial Intelligence for Metaverse Edge Computing

Terence Jie Chua , Wenhan Yu , Jun Zhao

Categories: Artificial Intelligence | Machine Learning

2022-12-19

The Metaverse can be considered the extension of the present-day web, which integrates the physical and virtual worlds, delivering hyper-realistic user experiences. The inception of the Metaverse brings forth many ecosystem services such as content creation, social entertainment, in-world value transfer, intelligent traffic, and healthcare. These services are compute-intensive and require computation offloading onto a Metaverse edge computing server (MECS). Existing Metaverse edge computing approaches do not efficiently and effectively handle resource allocation to ensure the fluid, seamless and hyper-realistic Metaverse experience required for Metaverse ecosystem services. Therefore, we introduce a new Metaverse-compatible, Unified, User and Task (UUT) centered artificial intelligence (AI)-based mobile edge computing (MEC) paradigm, which serves as a concept upon which future AI control algorithms could be built to develop a more user- and task-focused MEC.

GET-DIPP: Graph-Embedded Transformer for Differentiable Integrated Prediction and Planning

Jiawei Sun , Chengran Yuan , Shuo Sun , Zhiyang Liu , Terence Goh , Anthony Wong , Keng Peng Tee , Marcelo H. Ang Jr

Categories: Robotics

2022-11-11

Accurately predicting interactive road agents' future trajectories and planning a socially compliant and human-like trajectory accordingly are important for autonomous vehicles. In this paper, we propose a planning-centric prediction neural network, which takes surrounding agents' historical states and map context information as input, and outputs the joint multi-modal prediction trajectories for surrounding agents, as well as a sequence of control commands for the ego vehicle by imitation learning. An agent-agent interaction module along the time axis is proposed in our network architecture to better comprehend the relationship among all the other intelligent agents on the road. To incorporate the map's topological information, a Dynamic Graph Convolutional Neural Network (DGCNN) is employed to process the road network topology. Besides, the whole architecture can serve as a backbone for the Differentiable Integrated motion Prediction with Planning (DIPP) method by providing accurate prediction results and initial planning commands. Experiments are conducted on real-world datasets to demonstrate the improvements made by our proposed method in both planning and prediction accuracy compared to the previous state-of-the-art methods.

Resource Allocation for Mobile Metaverse with the Internet of Vehicles over 6G Wireless Communications: A Deep Reinforcement Learning Approach

Terence Jie Chua , Wenhan Yu , Jun Zhao

Categories: Machine Learning

2022-09-27

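Improving the interactivity and interconnectivity between people is one of the highlights of the Metaverse. The Metaverse relies on a core approach, digital twinning, which is a means of replicating physical world objects, people, actions and scenes in the virtual world. Being able to access scenes and information associated with the physical world, in real time and under mobility, is essential for developing a highly accessible, interactive and interconnected experience for all users. This development allows users in other locations to access high-quality, up-to-date real-world information about events happening elsewhere, and to socialize with others hyper-interactively. Nevertheless, receiving continual, smooth updates generated by others from the Metaverse is a challenging task because of the large data size of the virtual world graphics and the need for low-latency transmission. With the development of Mobile Augmented Reality (MAR), users can interact via the Metaverse in a highly interactive manner, even under mobility. Hence, in our work, we consider an environment with users in moving Internet of Vehicles (IoV) settings who download real-time virtual world updates from Metaverse Service Provider Cell Stations (MSPCSs) via wireless communications. We design an environment with multiple cell stations, in which users' virtual world graphic download tasks are handed over between cell stations. As transmission latency is the primary concern for receiving virtual world updates under mobility, our work aims to allocate system resources so as to minimize the total time users in vehicles take to download their virtual world scenes from the cell stations. We utilize deep reinforcement learning and evaluate the performance of the algorithms under different environmental configurations. Our work provides a use case of the Metaverse over AI-enabled 6G communications.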

Transition to Adulthood for Young People with Intellectual or Developmental Disabilities: Emotion Detection and Topic Modeling

Yan Liu , Maria Laricheva , Chiyu Zhang , Patrick Boutet , Guanyu Chen , Terence Tracey , Giuseppe Carenini , Richard Young

Categories: Natural Language Processing | Machine Learning (Statistics)

2022-09-21

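The transition to adulthood is an important life stage for many families. Prior research has shown that young people with intellectual or developmental disabilities (IDD) face more challenges than their peers. This study explores how natural language processing (NLP) methods, especially unsupervised machine learning, can assist psychologists in analyzing emotions and sentiments, and how topic modeling can be used to identify the common issues and challenges that young people with IDD and their families face. In addition, the results were compared with those obtained from young people without IDD. The findings show that NLP methods can be very useful for psychologists to analyze emotions, conduct cross-case analysis, and summarize key topics from conversational data. Our Python code is available at https://github.com/mlaricheva/emotion_topic_modeling.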
